Deep metagenomic sequencing unveils novel SAR202 lineages and their vertical adaptation in the ocean

SAR202 bacteria in the Chloroflexota phylum are abundant and widely distributed in the ocean. Their genome coding capacities indicate their potential roles in degrading complex and recalcitrant organic compounds in the ocean. However, our understanding of their genomic diversity, vertical distribution, and depth-related metabolisms is still limited by the number of assembled SAR202 genomes. In this study, we apply deep metagenomic sequencing (180 Gb per sample) to investigate microbial communities collected from six representative depths at the Bermuda Atlantic Time Series (BATS) station. We obtain 173 SAR202 metagenome-assembled genomes (MAGs). Intriguingly, 154 new species and 104 new genera are found based on these 173 SAR202 genomes. We add 12 new subgroups to the current SAR202 lineages. The vertical distribution of 20 SAR202 subgroups shows their niche partitioning among the euphotic, mesopelagic, and bathypelagic oceans. Deep-ocean SAR202 bacteria contain more genes and exhibit more metabolic potential for degrading complex organic substrates than those from the euphotic zone. With deep metagenomic sequencing, we uncover many new lineages of SAR202 bacteria and their potential functions, which greatly deepens our understanding of their diversity, vertical profile, and contribution to the ocean’s carbon cycling, especially in the deep ocean.

utilization of other potential nitrogen sources such as hypotaurine and taurine in the deep sea 14,16 .
SAR202 bacteria have been divided into seven groups (I to VII) based on the genomic phylogeny, with six subgroups (Ia-c and IIIa-c) in groups I and III 13,15,18. These different groups and subgroups occupy different niches of the water column in the ocean 15,16. Current culturable SAR202 bacterial strains are all from subgroup Ia, only representing a small subset of SAR202 genotypes 12. SAR202 bacteria are widely distributed throughout the water column of the ocean, ranging from the surface to the deep ocean trenches 6,15,18. They become relatively more abundant in the deeper ocean compared to the surface water 6,14. Previous metagenomic studies rarely exceeded 20 Gb per sample in the sequencing depth of seawater samples 13,19-21. Considering the diversity of SAR202 in the ocean, this sequencing coverage may not be high enough to recover low-abundance SAR202 genomes. Deeper sequencing coverage has the potential to uncover new taxa and yield more high-quality MAGs, providing a comprehensive understanding of the metabolic and ecological functions of microbes in the ocean.
The current Genome Taxonomy Database (GTDB) contains approximately 400 representative SAR202 metagenome-assembled genomes (MAGs), 92.5% of which derive from upper-ocean samples (above the oxygen minimum zone, OMZ), such as the Tara Oceans samples 22. Although SAR202 bacteria can make up a significant part of the deep ocean microbial community, the number of available MAGs from the deep ocean (below the OMZ) is still limited compared to the upper ocean.
In this study, we applied deep metagenomic sequencing (180 Gb per sample) to analyze samples from six depths at the BATS station. Our sequencing depth surpasses that of most previous studies by about 10-fold. This greatly expands the number of available SAR202 MAGs and enables us to identify novel SAR202 groups/subgroups in the ocean, providing enough SAR202 genomes to examine their vertical distribution, metabolic diversity, and unique ecological niches in the ocean.

Results and Discussion
Deep metagenomic sequencing recovered more MAGs per sample
The vertical physicochemical profile of the BATS sampling site is shown in Fig. 1. Briefly, the chlorophyll a concentration reached its maximum (0.74 mg/m³) at 106 m depth. Dissolved oxygen reached its minimum (136 µmol/kg) at 805 m depth and increased to 245-249 µmol/kg in the bathypelagic zone. Temperature declined progressively from 28.3 °C to 3.7 °C within the first 2000 m depth and stabilized at approximately 3 °C from 2000 to 4500 m depth. Salinity decreased from 36.7 to 34.8 PSU from the surface to deeper waters (Fig. 1). Deep metagenome sequencing was performed on six samples collected at six different depths (M1-M6) at the BATS station. At least 2 Gb of assembled contigs (> 2,000 bp) were obtained for each sample (Table 1).
Binning of these metagenome data yielded a total of 1248 nonredundant medium/high-quality MAGs (completeness > 50%, contamination < 10%, ANI < 95%) (Table 1, Supplementary data 1). On average, 208 MAGs were obtained per sample. The Tara Oceans study yielded 2631 MAGs from 234 samples (averaging 11 MAGs per sample) 23. The Malaspina expedition recovered 236 MAGs from 58 bathypelagic samples (averaging 4 MAGs per sample) 19. The sequencing depth of our BATS metagenome (180 Gb per sample) was substantially higher than that of the Tara Oceans (~30 Gb per sample) and Malaspina (~3.4 Gb per sample) datasets. The deep metagenomic sequencing applied in this study enabled us to assemble a high number of MAGs. It was common practice in earlier studies to sequence microbial metagenomes at 10-20 Gb per sample. However, this sequencing depth is not sufficient to recover rare species in the microbial community 24,25. Our study used traditional short-read shotgun metagenomics at 180 Gb per sample, which is ca. 10-fold higher than most of the earlier studies. Such a sequencing depth allowed us to identify more novel microbial species.
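The per-sample yields and depth ratios quoted above follow from simple division. The short Python sketch below reproduces them from the numbers stated in the text; the dictionary layout and function name are illustrative assumptions, not part of the study's pipeline.

```python
# MAG counts and sample counts exactly as quoted in the text.
studies = {
    "BATS": (1248, 6),
    "Tara Oceans": (2631, 234),
    "Malaspina": (236, 58),
}

# Approximate sequencing depth in Gb per sample, as quoted in the text.
depth_gb = {"BATS": 180.0, "Tara Oceans": 30.0, "Malaspina": 3.4}


def mags_per_sample(n_mags: int, n_samples: int) -> float:
    """Average number of MAGs recovered per sample."""
    return n_mags / n_samples


yields = {name: mags_per_sample(*counts) for name, counts in studies.items()}

# Fold difference in sequencing depth of BATS relative to each dataset.
fold_vs_bats = {name: depth_gb["BATS"] / gb for name, gb in depth_gb.items()}

for name in studies:
    print(f"{name}: {yields[name]:.1f} MAGs/sample, "
          f"BATS depth is {fold_vs_bats[name]:.1f}x this dataset")
```

By this arithmetic, the BATS depth is about 6-fold that of Tara Oceans and about 53-fold that of Malaspina, so the "ca. 10-fold" figure applies to the 10-20 Gb studies rather than to every earlier dataset.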
The number of recovered MAGs gradually increased with depth at the BATS station (from 73 MAGs at the surface to 360 MAGs at 4535 m depth) (Table 1). The lower recovery of MAGs in the upper ocean could be related to the vertical distribution of microbes in the ocean. It is known that microbial diversity and abundance generally decrease from the surface to the deep ocean 26,27. It will require a higher sequencing depth to assemble more high-quality MAGs in the upper ocean than in the deep ocean. Except for two MAGs identified in the DCM layer, all Chloroflexota MAGs were recovered from the dark ocean, ranging from 805 to 4535 m. Specifically, 26 Chloroflexota MAGs were found at 805 m, 40 at 2000 m depth, 72 at 2373 m depth, and 74 at 4535 m depth (Table 1), reflecting their increasing abundance towards the deeper ocean 14,15,28.
A high proportion of unclassified species and genera in the MAGs recovered from BATS
Among the 1248 BATS MAGs, 1172 bacterial and 76 archaeal MAGs were recovered, as shown in Fig. 2A, B. Chloroflexota has the highest number of MAGs (217 MAGs), followed by Planctomycetota (205 MAGs), Alphaproteobacteria (140 MAGs), Gammaproteobacteria (119 MAGs), and Acidobacteriota (97 MAGs). Notably, 83% of these MAGs represent novel species, and 47% are attributed to previously unidentified genera (Fig. 2C). Interestingly, 91% of the 217 Chloroflexota MAGs are novel species and 64% are new genera (Fig. 2D), suggesting a large proportion of Chloroflexota in the ocean remains unexplored.
The phylogenomic analysis of the Chloroflexota reveals that SAR202 is a deeply branching monophyletic group that radiates within the Chloroflexota, and SAR202 is a sister group to Dehalococcoidales (Fig. 3), which is consistent with prior identifications of SAR202 bacteria 15. Notably, 173 MAGs recovered from BATS were assigned to the SAR202 clade (Fig. 3), of which 154 were classified as new species, 104 as new genera, 48 as new families, and one as a new order (Fig. 4A). Our data greatly expanded the phylogenomic tree of SAR202, particularly within the less abundant groups IV, V, VI, and VII (Fig. 4B), indicating that the deep metagenome sequencing used in this study substantially augments the known diversity of SAR202 bacteria in the ocean.

Phylogenomic diversity of SAR202
Seven groups (I-VII) of SAR202 bacteria have been reported in earlier studies and correspond to different GTDB orders 12,15. Interestingly, eight of our SAR202 MAGs do not belong to these seven SAR202 groups. Instead, they fell into five distinct branches between SAR202 group V and Dehalococcoidales (Fig. 5). Each of these five branches is associated with a unique GTDB order name, o_SHYM01, o_JACPQK01, o_Plut-88900, o_SHYB01, and o_UBA6926 (Supplementary data 2), suggesting the presence of unclassified SAR202 members. This Unclustered SAR202 group appears to emerge earlier than the seven known SAR202 groups (Fig. 5). The taxonomic and evolutionary position of these Unclustered SAR202 genomes remains to be confirmed when more genome sequences become available.
Our SAR202 MAGs covered all the known SAR202 groups and subgroups, except for SAR202 subgroup Ic (Fig. 5). We added two new subgroups (Id and Ie) to group I and one new subgroup (IIId) to group III (Fig. 5).
Groups I and III each previously contained only three subgroups (Ia-Ic and IIIa-IIIc) 15. For the first time, we divided SAR202 group II into six subgroups (IIa-IIf); subgroups IId and IIe mainly contain our MAGs recovered from BATS. Two new lineages were added to groups VII and IV, respectively, and they include two GTDB orders (o_SAR202-VII-2 and o_GCA-2717565).

Vertical distribution of SAR202 bacteria in the global ocean
Principal component analysis (PCA) shows that the SAR202 community forms distinct clusters corresponding to the major depth zones, namely the euphotic (SRF and DCM), mesopelagic, and bathypelagic zones (Fig. 6A). Our study indicates that the SAR202 community varies with depth in the world ocean, which is consistent with previous studies of marine trenches and the Caspian Sea water column 14-16. We plotted the occurrence of 20 newly defined SAR202 subgroups at four major ocean depths (surface, DCM, mesopelagic, and bathypelagic) (Fig. 6B). SAR202 subgroups Id (average 4.1 TPM), Ie (average 1.9 TPM), IIc (average 3.5 TPM), IId (average 3.3 TPM), IIe (average 4.9 TPM), IIf (average 4.5 TPM), IIIa (average 17.5 TPM), IIIc (average 4.5 TPM), and IIId (average 1.7 TPM) are relatively more abundant in the deeper ocean (below 800 m) than in the euphotic ocean, where their TPM ranges from 0.001 to 0.9. Subgroups Ib (average 54.9 TPM), Ic (average 60.9 TPM), IIa (average 12.9 TPM), IIb (average 10.6 TPM), and IIIb (average 5.8 TPM) within groups I, II, and III are more abundant in the photic zone (< 200 m depth) than in the dark ocean (TPM ranging from 3.2 to 9.2) (Fig. 6B). Groups I, II, and III are the dominant SAR202 in the ocean 14,15. An earlier study reported that SAR202 group I dominates the euphotic ocean 15. However, we found that some group I subgroups (i.e., Id and Ie) are present in the deep ocean, suggesting that niche partitioning can differ at the subgroup level. Except for subgroup IIIb, most group III subgroups are abundant in the deep ocean, which is consistent with the previous study 15.
The Unclustered SAR202 group is more prevalent in the mesopelagic and bathypelagic ocean than in the euphotic ocean. SAR202 groups IV-VII are present throughout the whole water column and showed less distinguishable vertical patterns compared to groups I-III (Fig. 6). SAR202 groups IV, VI, and VII are abundant in euphotic water, suggesting a preference for the euphotic zone. Such distribution patterns of SAR202 bacteria reflect the ecological diversity and adaptation strategies of microbial life in response to varying environmental factors such as light, temperature, pressure, and nutrient availability at different ocean depths.

Genomic characteristics of SAR202 groups/subgroups
We chose 124 high-quality SAR202 genomes with over 90% completeness from all 471 genomes to analyze the genomic characteristics of the groups/subgroups (Supplementary data 3). These genomes consisted of 84 GTDB genomes and 40 BATS genomes, and they covered all known SAR202 groups/subgroups. The deep ocean SAR202 subgroups (Id, Ie, IId, IIe, IIf, IIIa, IIIb, IIIc, IIId) encode ~1000 more ORFs than the euphotic subgroups (Ib, Ic, IIa, IIb) (Fig. 7A), suggesting that SAR202 subgroups in the euphotic ocean may have smaller genomes (with fewer genes) than those in the deep ocean. Notably, SAR202 subgroup IIIb tends to have a wide range of ORF numbers (Fig. 7A) and is widely distributed in the ocean water column compared to the other subgroups of group III (Fig. 6).
Based on the frequency of KOs in each high-quality SAR202 genome, NMDS analysis was used to explore the similarity in functional composition among SAR202 genomes (Fig. 7B). Our study shows that the functional composition varies between SAR202 groups/subgroups. A distinct separation of SAR202 groups I, II, III, and the other SAR202 groups is evident (Fig. 7B), indicating a functional difference between these SAR202 groups. The functional composition of SAR202 group III is distinct from that of SAR202 groups I and II based on gene composition (Fig. 7B), reflecting their distant phylogenomic relationships (Fig. 5). Different groups of SAR202 may carry specific genes that reflect their adaptations. For example, a previous study found that the FNNOs genes only appear in SAR202 group III, while the enolase genes are widely present in group I 15.

Metabolic difference of SAR202 in different depths
To elucidate the functional disparities of SAR202 across varying ocean depths, we analyzed 31 of the 124 high-quality SAR202 genomes (Supplementary data 5). These 31 genomes represent the relatively more abundant SAR202 groups in the ocean because their average relative abundance is higher than 10 TPM. These genomes were identified at different ocean depths, including eight genomes from the euphotic, ten from the mesopelagic, and thirteen from the bathypelagic zones. Notably, the relative abundance of these genomes varies significantly with depth in the BATS water column (Fig. 8A), underscoring their suitability to represent the change of SAR202 bacteria across ocean depths. SAR202 bacteria in the dark ocean exhibit more complex metabolic functions compared to those in the euphotic zone, particularly in the degradation of aromatic compounds (Fig. 8B). Genes associated with the degradation of substances such as catechol, toluene, trans-cinnamate, phthalate, and polyaromatic hydrocarbons including dioxin are prevalent among deep ocean SAR202 bacteria, yet are absent in their euphotic counterparts (Fig. 8B). Our study indicates that deep ocean SAR202 bacteria have notable potential for degrading complex dissolved organic carbon (Fig. 8B). This is consistent with the previous finding that SAR202 bacteria derived from the marine trench and dark ocean are prone to degrade refractory dissolved organic carbon (RDOC) 13,16,17. More than 95% of DOC in the deep ocean water is RDOC, and it was previously suggested that it can remain in the deep ocean for hundreds to thousands of years 29,30, although the age of deep-ocean DOM is currently debated 31 and RDOC turnover times are unknown. The fact that deep ocean SAR202 bacteria have the capability to break down RDOC implies that the RDOC pool in the dark ocean is subject to bacterial degradation, further fueling the debate on RDOC turnover times. Although we do not know the actual degradation rate of RDOC by SAR202, it is plausible that SAR202 bacteria could play an active role in the turnover of the ocean's RDOC considering their genomic versatility for degrading RDOC.
Deep ocean SAR202 bacteria exhibit enhanced capabilities for synthesizing a wider array of amino acids and cofactors/vitamins compared to their euphotic counterparts (Fig. 8B). Notably, pathways such as leucine degradation, methionine salvage, polyamine biosynthesis, siroheme biosynthesis, heme biosynthesis, and cobalamin biosynthesis are prevalent in the deep ocean SAR202, yet are absent in the euphotic SAR202 (Fig. 8B). In contrast, the general L-amino acid transport system is commonly observed in the euphotic SAR202 but is rare in the deep ocean SAR202, suggesting that the utilization of amino acids directly from seawater could be important to SAR202 in the surface ocean. Interestingly, cobalamin biosynthesis is enriched in SAR202 group III (Fig. 8B), a phenomenon also seen in SAR202 in the Mariana Trench 16,18. It appears that all SAR202 bacteria have the potential to assimilate ammonium and urea and to be involved in the reduction of thiosulfate to hydrogen sulfide (Fig. 8B). In addition, SAR202 bacteria in the deep ocean have the potential to assimilate sulfate, utilize organic sulfur such as alkanesulfonate, and oxidize sulfite (Fig. 8B, Supplementary data 6). Together, these genomic features suggest that SAR202 bacteria can be important in the ocean's sulfur cycling. SAR202 bacteria in the bathypelagic ocean layers have the potential to utilize multiple organosulfur compounds and oxidize sulfite 14. Sulfite oxidation can generate ATP and thus provide an essential energy source for SAR202 in the Ionian Sea at 3500 m depth and in the Mariana Trench 14,18. Some deep ocean SAR202 bacteria show the potential for using phosphonate (Fig. 8B), suggesting a metabolic adaptation for utilizing organic phosphorus in the deep ocean.
SAR202 bacteria in the euphotic ocean (subgroups Ia, Ib, IIa, and IIb within groups I and II) encode the bacteriorhodopsin-like genes (Fig. 8B). Bacterioplankton in sunlit oceanic regions commonly possess the proteorhodopsin gene, facilitating additional energy production through a light-driven proton pump 32. Previous research has confirmed the presence of the proteorhodopsin gene in SAR202 strains retrieved from waters shallower than 150 meters 15, suggesting the critical role of photic energy utilization in these SAR202 bacteria. Moreover, the predicted galactonate dehydratase (dgoD) gene, a member of the COG4948 paralogs, is prevalent in SAR202 group I (at least 12 dgoD genes per genome), which is far more abundant than in other SAR202 groups (Supplementary data 6). This gene cluster is abundant in cultured SAR202 strains from group Ia, known for their capacity to metabolize various carbohydrates 12. In the euphotic ocean, phytoplankton release polysaccharides which can be rapidly assimilated by bacteria 30,31. We hypothesize that there is a close ecological interaction between SAR202 bacterioplankton and phytoplankton in the photic zone.
It is noteworthy that the three CO dehydrogenase genes (coxS, coxM, and coxL) are widely present in the deep water SAR202 bacteria, while SAR202 bacteria in the photic zone only contain the coxS and coxM genes but not the coxL gene (Fig. 8B). It has been reported that CO oxidation provides energy which supports microbial growth and survival in the ocean 33. The cox genes have been found in Chloroflexota 34 and SAR202 13. The coxL gene encodes the large catalytic subunit of CO dehydrogenase. It would be interesting to see whether the surface SAR202 bacteria have lost the CO oxidation function, since they do not encode the coxL gene.

Conclusion
Deep metagenomic sequencing of the BATS water column has revealed substantial insights into the genomic diversity and metabolic capabilities of SAR202 bacteria across different ocean depths. By recovering a significant number of MAGs, especially from the deeper ocean water, we expanded the phylogenetic diversity of marine SAR202 from 11 to 23 groups/subgroups and nearly doubled the number of SAR202 MAGs in the current metagenome database. We found that SAR202 bacteria (subgroups Id, Ie, IIc, IId, IIe, IIf, IIIa, IIIc, and IIId within groups I, II, and III) in the bathypelagic zone possess enhanced metabolic functions for degrading complex organic compounds and biosynthesizing essential amino acids and cofactors/vitamins. Conversely, SAR202 bacteria (subgroups Ia, Ib, IIa, and IIb) in the euphotic zone harness light-driven processes and interact closely with phytoplankton. The SAR202 bacteria in the surface ocean likely utilize labile organic substrates produced by photosynthetic organisms. On the other hand, deep ocean SAR202 bacteria are more capable of degrading recalcitrant DOC, supporting the previous hypothesis that SAR202 bacteria have the potential to degrade more complex and resistant dissolved organic matter in the deep ocean 13,15. This research not only highlights the ecological significance of SAR202 bacteria but also sets a foundation for future studies aimed at understanding their specific functions and interactions within marine ecosystems.

Sample and environmental data collection
Six samples were collected from different depths (4, 106, 805, 2000, 2373, and 4535 m depth) at the BATS station (31°40' N, 64°10' W) aboard the R/V Atlantic Explorer on August 5-11, 2019. These water samples were labeled M1 to M6, representing surface (M1), deep chlorophyll maximum (DCM) (M2), oxygen minimum zone (OMZ) (M3), and bathypelagic zone (M4-M6). For each sample, 120 L of seawater was collected using Niskin bottles and prefiltered through a 3 μm pore-size polycarbonate membrane (142 mm in diameter, Pall) with a peristaltic pump. Subsequently, the filtrate was filtered through one 0.22 μm pore-size polycarbonate membrane (142 mm in diameter, Pall). The filters were stored in a −80 °C freezer during the cruise, shipped with liquid N2, and stored in a −80 °C freezer in the laboratory until DNA extraction. Microbial cells retained on the 0.22 μm filters (0.22-3 μm) were used for DNA extraction. Environmental data such as temperature, salinity, oxygen, and fluorescence were obtained from the CTD profiles.

DNA extraction and sequencing
Microbial DNA was extracted from half of the 0.22 μm filter described above following a phenol-chloroform protocol 35. For each sample, 200 ng of DNA was used to prepare the sequencing library. Shotgun sequencing (paired-end 2 × 150 bp) was performed using the Illumina HiSeq2000 platform at the Genome Resource Center, University of Maryland School of Medicine, and ca. 180 Gb of raw data was obtained for each sample. The flowchart of the bioinformatics analysis is shown in Fig. S1. Trimmomatic 0.36 36 was used to remove low-quality reads (LEADING:10, TRAILING:10, SLIDINGWINDOW:4:20, MINLEN:70). The detailed information is shown in Table 1.
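As a minimal sketch of how the trimming settings above map onto a paired-end Trimmomatic invocation, the Python snippet below assembles the command line. The four trimming steps are those given in the text; the jar file name, input file names, and output prefix are hypothetical placeholders, and the exact command used by the authors is not stated in the source.

```python
import shlex


def trimmomatic_pe_command(r1, r2, prefix, jar="trimmomatic-0.36.jar"):
    """Build a paired-end Trimmomatic command with the trimming steps from the text.

    r1, r2, prefix, and jar are hypothetical names used only for illustration.
    """
    # Trimmomatic PE expects four outputs in this order:
    # forward paired, forward unpaired, reverse paired, reverse unpaired.
    outputs = [f"{prefix}_{tag}.fastq.gz" for tag in ("1P", "1U", "2P", "2U")]
    # Trimming steps are applied in the order they are listed.
    steps = ["LEADING:10", "TRAILING:10", "SLIDINGWINDOW:4:20", "MINLEN:70"]
    return ["java", "-jar", jar, "PE", r1, r2, *outputs, *steps]


cmd = trimmomatic_pe_command("M1_R1.fastq.gz", "M1_R2.fastq.gz", "M1_trimmed")
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if it is later passed to `subprocess.run`.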

Analysis of Chloroflexota MAGs
Based on GTDB results, a total of 217 Chloroflexota MAGs were obtained from the BATS water column. The average abundance of each Chloroflexota MAG in each sample was calculated as the length-weighted average of the MAG's contig abundances estimated with Salmon v0.13.1 48. Open reading frames (ORFs) of Chloroflexota MAGs were predicted by Prokka v1.14.6 49. Predicted genes from each MAG were annotated against the Kyoto Encyclopedia of Genes and Genomes (KEGG) and eggNOG databases 50,51 using DIAMOND v0.9.14 52 (E-value cutoff 1 × 10⁻⁶).

Phylogenetic tree analysis
The 217 BATS MAGs and all 1502 representative Chloroflexota genomes of the GTDB (as of 15-Apr-2022) 22 were used to construct a phylogenomic tree, using 120 core genes identified with GTDB-Tk v0.1.3 47. These core genes were aligned and concatenated using the gtdbtk align module with default parameters (genomes with a low number of identified markers are excluded from the analysis at this step). IQ-TREE was used to infer single-gene phylogenies with the following parameters (-bb 1000) 53. The resulting maximum likelihood phylogenetic trees were used to analyze the classification of the 217 BATS Chloroflexota MAGs. The phylogenomic tree was visualized using iTOL 54. The categories of MAGs were primarily determined by the placement of the genomes in the phylogenetic trees. We identified novel SAR202 subgroups where a single branch contains more than three SAR202 genomes and the ANI is lower than 70% when compared to nearby genomes.

Predicting heliorhodopsin genes
A total of 585 reference sequences of heliorhodopsin (HeR) were downloaded from NCBI. After manually examining these sequences, BLAST v2.12.0 55 was used to build a HeR protein database (makeblastdb) for annotating predicted HeR proteins. The predicted genes in each Chloroflexota MAG were translated into protein sequences and searched against the HeR protein database with blastp (E-value cutoff 1 × 10⁻⁶) to obtain the HeR gene information.
Evaluating the abundance of SAR202 bacteria in the oceans
A total of 471 SAR202 genomes (including our 173 SAR202 MAGs and 298 SAR202 genomes downloaded from GTDB) were used to investigate the vertical structure of the SAR202 community in the world's ocean (0-4535 m depth). The raw sequences of the 0.2-3 μm samples from the Tara Oceans 56 and of all Malaspina samples 19 were downloaded from the European Bioinformatics Institute (EBI). These samples are mainly distributed in the Atlantic, Indian, and Pacific oceans. The detailed information is in Supplementary data 3. Salmon v0.13.1 48 was applied to calculate the coverage of all SAR202 contigs in each sample. Coverage tables were acquired to assess the TPM (transcripts per million) abundance of each contig in each sample. The TPM abundance of each bin in each sample was calculated by taking the length-weighted average of the bin's contig abundances with the script split_salmon_out_into_bins.py in metaWRAP 44.
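The length-weighted average described above can be written out explicitly. The sketch below is a plain-Python illustration of that calculation, not the metaWRAP script itself; the function name and the toy contig values are assumptions made for the example.

```python
def bin_abundance(contigs):
    """Length-weighted mean abundance of one bin (MAG) in one sample.

    contigs: iterable of (length_bp, tpm) pairs, one pair per contig in the bin.
    Each contig's TPM is weighted by its length, so long contigs contribute
    proportionally more than short ones.
    """
    contigs = list(contigs)
    total_length = sum(length for length, _ in contigs)
    if total_length == 0:
        raise ValueError("bin has no contigs with non-zero length")
    weighted_sum = sum(length * tpm for length, tpm in contigs)
    return weighted_sum / total_length


# Toy bin with three contigs (values invented for illustration).
example_bin = [(10_000, 4.0), (30_000, 8.0), (60_000, 2.0)]
print(bin_abundance(example_bin))  # (40000 + 240000 + 120000) / 100000 = 4.0
```

An unweighted mean of the same three contigs would give about 4.67, which overstates the contribution of the two shorter contigs.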
Selection of high-quality SAR202 MAGs and comparative genomics
CheckM v1.0.7 45 was applied to evaluate the quality of all SAR202 genomes. We picked high-quality SAR202 genomes based on their high completeness (> 90%) and low contamination (< 10%). These genomes were used to compare genomic features between different groups/subgroups. We then further selected 31 SAR202 genomes that are abundant (> 10 TPM) based on their occurrence frequency in the Tara Oceans, Malaspina, and BATS samples. These high-abundance SAR202 genomes were selected to represent SAR202 genomes at different depths, such as the euphotic, OMZ, and bathypelagic ocean. These genomes were annotated against the KEGG and eggNOG databases 50,51. Metabolic comparison was performed based on the presence and absence of specific KEGG modules.
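The two selection steps above amount to threshold filters on completeness, contamination, and mean abundance. The following sketch applies the thresholds stated in the text to a handful of invented records; the `GenomeRecord` class, its field names, and the toy values are hypothetical and are not taken from the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeRecord:
    name: str             # hypothetical genome identifier
    completeness: float   # CheckM completeness, percent
    contamination: float  # CheckM contamination, percent
    mean_tpm: float       # average abundance across samples, TPM


def is_high_quality(g, min_completeness=90.0, max_contamination=10.0):
    """High-quality criterion from the text: > 90% complete, < 10% contaminated."""
    return g.completeness > min_completeness and g.contamination < max_contamination


def select_abundant_high_quality(genomes, min_tpm=10.0):
    """Keep high-quality genomes whose mean abundance exceeds min_tpm."""
    return [g for g in genomes if is_high_quality(g) and g.mean_tpm > min_tpm]


# Toy records, invented for illustration only.
toy_genomes = [
    GenomeRecord("toy_A", 95.2, 1.3, 17.5),   # passes both filters
    GenomeRecord("toy_B", 92.0, 2.1, 4.9),    # high quality but not abundant
    GenomeRecord("toy_C", 71.4, 0.8, 54.9),   # abundant but incomplete
    GenomeRecord("toy_D", 97.8, 12.0, 60.9),  # abundant but contaminated
]

selected = select_abundant_high_quality(toy_genomes)
print([g.name for g in selected])  # ['toy_A']
```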

Statistics and analysis
All calculations and plots were performed in an R environment (version 4.3.3). All bar and dot charts were plotted using the ggplot2 package (version 3.5.0) 57. Based on the TPM abundance of SAR202 in the Tara Oceans, Malaspina, and BATS samples, the vegan package (version 2.6-4) was used for the principal component analysis (PCA) 58. This shows the distribution of SAR202 bacteria in the world's oceans. High-quality genomes of SAR202 bacteria produced 3922 different KOs (KEGG Orthology) via KEGG annotation. A table was created by calculating the frequency of KOs in each SAR202 genome, assigning a value of 0 if the genome lacked the corresponding KO. Non-metric multidimensional scaling (NMDS) analysis (using the Bray-Curtis dissimilarity index) was applied to analyze the genomic composition of SAR202 bacteria using the gene frequency data. The distance in NMDS represents the functional similarity of MAGs.
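To make the KO frequency table and the Bray-Curtis index concrete, the sketch below builds a small table (filling in 0 for absent KOs) and computes pairwise dissimilarities in plain Python. The study performed this analysis in R with vegan; the Python code, function names, genome names, and KO lists here are illustrative assumptions only, and the NMDS ordination step itself is not reproduced.

```python
from collections import Counter


def ko_frequency_table(annotations):
    """Build a genome-by-KO count table.

    annotations: dict mapping genome name -> list of KO identifiers,
    one entry per annotated gene. KOs missing from a genome get a count of 0.
    """
    counts = {genome: Counter(kos) for genome, kos in annotations.items()}
    all_kos = sorted(set().union(*(c.keys() for c in counts.values())))
    table = {genome: [c.get(ko, 0) for ko in all_kos] for genome, c in counts.items()}
    return all_kos, table


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


# Toy annotations, invented for illustration only.
toy_annotations = {
    "genome_A": ["K00001", "K00001", "K00002", "K00003", "K00003", "K00003"],
    "genome_B": ["K00001", "K00002", "K00003", "K00003", "K00003", "K00004"],
    "genome_C": ["K00005"] * 4,
}

kos, table = ko_frequency_table(toy_annotations)
d_ab = bray_curtis(table["genome_A"], table["genome_B"])
d_ac = bray_curtis(table["genome_A"], table["genome_C"])
print(kos)
print(f"A vs B: {d_ab:.3f}, A vs C: {d_ac:.3f}")
```

Genomes that share most KOs at similar copy numbers (A and B) have a dissimilarity near 0, while genomes with no KOs in common (A and C) reach the maximum of 1; NMDS then places genomes so that the rank order of these dissimilarities is preserved as well as possible.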

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 2 | The proportion of classified and unclassified taxa of bacterial, archaeal and Chloroflexota MAGs recovered from BATS based on the GTDB classification. The upper panel shows the number of BATS MAGs in bacteria (A) and archaea (B). The lower panel shows the GTDB classification at different taxonomic levels (species, genus, family, and order) for the 1248 BATS MAGs (C) and 217 Chloroflexota MAGs (D). The unclassified taxa are shown in blue, and the classified taxa in red.

Fig. 3 | Phylogenomic classification of Chloroflexota based on a total of 1722 Chloroflexota genomes retrieved from the GTDB (15-Apr-2022). The maximum likelihood tree was inferred from the concatenation of 120 proteins. The 217 Chloroflexota MAGs recovered from the BATS station were labeled in red, and all known representative Chloroflexota genomes were labeled in black. Different classes of Chloroflexota were shown in different colors. The detailed taxonomy is shown in Supplementary data 2.

Fig. 5 | Phylogenomic classification of SAR202 bacteria. This tree is an expansion of the SAR202 branch in Fig. 3. The seven known SAR202 groups were labeled with different colors. The unclustered SAR202 between SAR202 group V and Dehalococcoidales were labeled with different gray shades.

Fig. 6 | Niche partitioning of SAR202 bacteria at different ocean depths. The principal component analysis (PCA) shows the clustering of SAR202 communities collected from four different depths (surface, DCM, mesopelagic, and bathypelagic water) (A). The relative abundance of SAR202 groups or subgroups at four different depths (surface, DCM, mesopelagic, and bathypelagic) of the world's oceans (B). Dots represent samples from Tara Oceans (from surface to 990 m), Malaspina (from 2150 to 4018 m), and BATS (from 4 to 4535 m). The four dot colors represent the four different depths.

Fig. 7 | Genetic information in different SAR202 groups/subgroups based on 124 high-quality (> 90% completeness) SAR202 genomes derived from GTDB and BATS samples. (A) The number of open reading frames (ORFs) across SAR202 groups. (B) Non-metric multidimensional scaling (NMDS) analysis of the genomic functional composition of various SAR202 groups/subgroups based on KEGG annotation.

Fig. 8 | Metabolic characteristics of 31 dominant high-quality SAR202 bacteria in the vertical ocean. The relative abundance in the BATS water column and genome size (A), and selected metabolic functions (B) of the 31 selected high-quality SAR202 MAGs (> 90% completeness) which represent different SAR202 groups (except for groups IV and VII). The detailed genome information is shown in Supplementary data 5. Samples from Tara Oceans, Malaspina, and the BATS station were utilized to determine the average abundance of each MAG. The MAGs from BATS were labeled in red.

Table 1 | Deep metagenome sequencing information at the BATS station